Optimistic distributionally robust optimization for nonparametric likelihood approximation
The likelihood function is a fundamental component of Bayesian statistics. However, evaluating the likelihood of an observation is computationally intractable in many applications. In this paper, we propose a nonparametric approximation of the likelihood that identifies a probability measure in the neighborhood of the nominal measure which maximizes the probability of observing the given sample point. We show that when the neighborhood is constructed via the Kullback-Leibler divergence, moment conditions, or the Wasserstein distance, the optimistic likelihood can be determined by solving a convex optimization problem, and it admits an analytical expression in particular cases. We also show that posterior inference with our optimistic likelihood approximation enjoys strong theoretical performance guarantees, and it performs competitively in a probabilistic classification task.
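To make the idea concrete, here is a minimal sketch of an optimistic likelihood in a toy discrete special case (not the paper's general nonparametric construction): the nominal measure `p` is a probability vector, the neighborhood is the KL ball `{q : KL(q || p) <= r}`, and we maximize the mass `q[i0]` placed on the observed atom. The KL-optimal measure keeps the remaining mass proportional to `p`, so the problem reduces to a one-dimensional search.

```python
import numpy as np

# Toy sketch (assumed discrete special case): maximize q[i0] over the
# KL ball {q : KL(q || p) <= r} around a discrete nominal measure p.
# The optimizer puts mass t on atom i0 and scales the other atoms
# proportionally to p, so KL reduces to a scalar function of t.
def optimistic_likelihood(p, i0, r):
    p = np.asarray(p, dtype=float)
    rest_sum = np.delete(p, i0).sum()

    def kl(t):
        # KL(q || p) when q = (t on atom i0, (1 - t) spread like p elsewhere)
        return t * np.log(t / p[i0]) + (1 - t) * np.log((1 - t) / rest_sum)

    # kl(t) is 0 at t = p[i0] and increasing on (p[i0], 1),
    # so bisect for the largest feasible t.
    lo, hi = p[i0], 1.0 - 1e-12
    if kl(hi) <= r:
        return hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kl(mid) <= r:
            lo = mid
        else:
            hi = mid
    return lo
```

With radius `r = 0` this recovers the nominal likelihood `p[i0]`, and the value increases monotonically toward 1 as the neighborhood grows, matching the "optimistic" interpretation.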
Semi-supervised Learning based on Distributionally Robust Optimization
We propose a novel method for semi-supervised learning (SSL) based on data-driven distributionally robust optimization (DRO) using optimal transport metrics. Our proposed method improves generalization error by using the unlabeled data to restrict the support of the worst-case distribution in our DRO formulation. We make the DRO formulation practical by proposing a stochastic gradient descent algorithm that renders the training procedure easy to implement. We demonstrate that our semi-supervised DRO method improves the generalization error over natural supervised procedures and state-of-the-art SSL estimators. Finally, we include a discussion of the large-sample behavior of the optimal uncertainty region in the DRO formulation, which exposes important aspects such as the role of dimension reduction in SSL.
Calculating optimistic likelihoods using (geodesically) convex optimization
A fundamental problem arising in many areas of machine learning is the evaluation of the likelihood of a given observation under different nominal distributions. Frequently, these nominal distributions are themselves estimated from data, which makes them susceptible to estimation errors. We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its vicinity and to evaluate an optimistic likelihood, that is, the maximum of the likelihood over all distributions in the ambiguity set. When the proximity of distributions is quantified by the Fisher-Rao distance or the Kullback-Leibler divergence, the emerging optimistic likelihoods can be computed efficiently using either geodesic or standard convex optimization techniques. We showcase the advantages of working with optimistic likelihoods on a classification problem using synthetic as well as empirical data.
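One analytically tractable special case (an assumed illustration, not the paper's general treatment) is a Gaussian ambiguity set in which only the mean varies: over means in the KL ball `KL(N(mu, S) || N(mu0, S)) <= r` with fixed covariance `S`, the KL divergence reduces to half the squared Mahalanobis distance between the means, so the optimistic likelihood shifts the mean straight toward the observation, clipped at Mahalanobis radius `sqrt(2r)`.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hedged sketch of a Gaussian special case: maximize the density of x
# over means mu with KL(N(mu, S) || N(mu0, S)) <= r, covariance S fixed.
# KL equals 0.5 * Mahalanobis(mu, mu0)^2, so the best feasible mean moves
# toward x along the straight line, clipped at radius sqrt(2 * r).
def optimistic_gaussian_likelihood(x, mu0, S, r):
    Sinv = np.linalg.inv(S)
    d = x - mu0
    maha = np.sqrt(d @ Sinv @ d)                 # Mahalanobis distance to x
    step = min(1.0, np.sqrt(2 * r) / maha) if maha > 0 else 0.0
    mu_star = mu0 + step * d                     # clipped optimal mean
    return multivariate_normal(mu_star, S).pdf(x)
```

At `r = 0` this returns the nominal likelihood; for large `r` the mean reaches `x` and the value saturates at the density's peak.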
Calculating optimistic likelihoods using (geodesically) convex optimization
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 8-14 Dec 2019, Vancouver, Canada. Version of Record. Self-funded. Published.
Optimistic distributionally robust optimization for nonparametric likelihood approximation
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 8-14 Dec 2019, Vancouver, Canada. Version of Record. Funders: Others, EPSRC. Published.
Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning
Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution, especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation, or minimum mean square error estimation, among others.
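One such ramification can be sketched concretely (a hedged special case, under an assumed infinity-norm transport cost, not the tutorial's general theory): for absolute-loss linear regression, the worst-case expected loss over a type-1 Wasserstein ball of radius `eps` equals the empirical loss plus `eps` times the Lipschitz constant of the loss, here the 1-norm of the extended coefficient vector `(beta, -1)`. Wasserstein DRO thus recovers a Lasso-style regularizer.

```python
import numpy as np

# Hedged sketch: for the loss (x, y) -> |y - beta @ x|, which is Lipschitz
# with constant ||beta||_1 + 1 under the infinity norm, the worst case over
# a W1 ball of radius eps adds exactly eps times that constant, because
# mass can be shifted in the steepest direction on an unbounded domain.
def robust_absolute_loss(beta, X, y, eps):
    empirical = np.mean(np.abs(y - X @ beta))
    lipschitz = np.abs(beta).sum() + 1.0      # 1-norm of (beta, -1)
    return empirical + eps * lipschitz        # regularized reformulation
```

The bound is attained by shifting every sample `eps` in the direction that grows its residual, which is how the test below checks the identity numerically.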